The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
نویسندگان
چکیده
We derive a one-period look-ahead policy for finiteand infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the optimal policy in the classic bandit problem, and outperforms many learning policies in the correlated case.
منابع مشابه
ADAPTIVE FUZZY TRACKING CONTROL FOR A CLASS OF PERTURBED NONLINEARLY PARAMETERIZED SYSTEMS USING MINIMAL LEARNING PARAMETERS ALGORITHM
In this paper, an adaptive fuzzy tracking control approach is proposed for a class of single-inputsingle-output (SISO) nonlinear systems in which the unknown continuous functions may be nonlinearlyparameterized. During the controller design procedure, the fuzzy logic systems (FLS) in Mamdani type are applied to approximate the unknown continuous functions, and then, based on the minimal learnin...
متن کاملDesigning stable neural identifier based on Lyapunov method
The stability of learning rate in neural network identifiers and controllers is one of the challenging issues which attracts great interest from researchers of neural networks. This paper suggests adaptive gradient descent algorithm with stable learning laws for modified dynamic neural network (MDNN) and studies the stability of this algorithm. Also, stable learning algorithm for parameters of ...
متن کاملFuzzy adaptive tracking control for a class of nonlinearly parameterized systems with unknown control directions
This paper addresses the problem of adaptive fuzzy tracking control for aclass of nonlinearly parameterized systems with unknown control directions.In this paper, the nonlinearly parameterized functions are lumped into the unknown continuous functionswhich can be approximated by using the fuzzy logic systems (FLS) in Mamdani type. Then, the Nussbaum-type function is used to de...
متن کاملThe Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS
The extended classifier systems (XCS) by producing a set of rules is (classifier) trying to solve learning problems as online. XCS is a rather complex combination of genetic algorithm and reinforcement learning that using genetic algorithm tries to discover the encouraging rules and value them by reinforcement learning. Among the important factors in the performance of XCS is the possibility to...
متن کاملThe Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS
The extended classifier systems (XCS) by producing a set of rules is (classifier) trying to solve learning problems as online. XCS is a rather complex combination of genetic algorithm and reinforcement learning that using genetic algorithm tries to discover the encouraging rules and value them by reinforcement learning. Among the important factors in the performance of XCS is the possibility to...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Operations Research
دوره 60 شماره
صفحات -
تاریخ انتشار 2012